An Illuminating Counterexample 
Michael Hardy 

Suppose that $X_1, \dots, X_n$ are independent random variables with a normal (or "Gaussian") distribution with expectation $\mu$ and variance $\sigma^2$. A statistician who has observed the values of $X_1, \dots, X_n$ must guess the values of $\mu$ and $\sigma^2$. Among the statistically naive, it is sometimes asserted that

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2,$$

where $\bar{X} = (X_1 + \cdots + X_n)/n$, is a better estimator of $\sigma^2$ than is

$$T^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2,$$

because $S^2$ is "unbiased" and $T^2$ is "biased." That means $E(S^2) = \sigma^2 \neq E(T^2)$; i.e., an "unbiased estimator" is a statistic whose expected value is the quantity to be estimated.

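As a concrete illustration of the two formulas, here is a minimal Python sketch; the sample values are made up for the example. Python's `statistics.variance` and `statistics.pvariance` use the $n-1$ and $n$ divisors respectively, so they reproduce $S^2$ and $T^2$.

```python
import statistics

def s_squared(xs):
    """Unbiased estimator S^2: sum of squared deviations divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def t_squared(xs):
    """Biased estimator T^2: the same sum divided by n."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data, not from the article
print(s_squared(sample), t_squared(sample))  # 2.5 2.0
# The standard library follows the same two conventions.
print(statistics.variance(sample), statistics.pvariance(sample))  # 2.5 2.0
```

The sum of squared deviations here is 10, so the two estimators differ only in dividing by 4 or by 5.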
The goodness of an estimator is sometimes measured by the smallness of its "mean squared error," defined as $E\big(([\text{estimator}] - [\text{quantity to be estimated}])^2\big)$. By that criterion the biased estimator $T^2$ would be better than the unbiased estimator $S^2$, since

$$E\big((T^2 - \sigma^2)^2\big) < E\big((S^2 - \sigma^2)^2\big),$$

but the difference is so slight that no one's statistical conscience is horrified by anyone's preferring $S^2$ over $T^2$. Besides, the smallness of the mean squared error as a criterion for evaluating estimators is not necessarily sacred anyway.
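The bias and mean-squared-error claims can be checked by simulation. The sketch below assumes standard normal data with $n = 5$ (an arbitrary choice) and compares the Monte Carlo results with the standard closed forms for normal samples, $E((S^2-\sigma^2)^2) = 2\sigma^4/(n-1)$ and $E((T^2-\sigma^2)^2) = (2n-1)\sigma^4/n^2$, which are $0.5$ and $0.36$ when $\sigma = 1$.

```python
import random

def simulate(n, sigma, trials, seed=0):
    """Monte Carlo averages of S^2 and T^2 and of their squared errors,
    for samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    var = sigma ** 2
    sum_s = sum_t = se_s = se_t = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
        s2, t2 = ss / (n - 1), ss / n
        sum_s += s2
        sum_t += t2
        se_s += (s2 - var) ** 2
        se_t += (t2 - var) ** 2
    return sum_s / trials, sum_t / trials, se_s / trials, se_t / trials

n, sigma = 5, 1.0
mean_s, mean_t, mse_s, mse_t = simulate(n, sigma, trials=100_000)
print(f"E(S^2) ~ {mean_s:.3f}, E(T^2) ~ {mean_t:.3f}")  # near 1 and 0.8
print(f"MSE(S^2) ~ {mse_s:.3f} (exact {2 * sigma**4 / (n - 1):.3f})")
print(f"MSE(T^2) ~ {mse_t:.3f} (exact {(2 * n - 1) * sigma**4 / n**2:.3f})")
```

Both exact expressions approach $2\sigma^4/n$ as $n$ grows, so the gap between the two estimators shrinks with the sample size.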

Figure 1: $D = \{(x, y) : x^2 + y^2 \le 1\}$

A more damning example, well-known among statisticians, is described in […, p. 168]. We have $X \sim \text{Poisson}(\lambda)$, so that $P(X = x) = \lambda^x e^{-\lambda}/x!$ for $x = 0, 1, 2, \dots$, and $P(X = 0)^2 = e^{-2\lambda}$ is to be estimated. Any unbiased estimator $\delta(X)$ satisfies

$$\sum_{x=0}^{\infty} \delta(x)\,\frac{\lambda^x e^{-\lambda}}{x!} = e^{-2\lambda}$$

uniformly in $\lambda > 0$. Clearly the only such function is $\delta(x) = (-1)^x$: multiplying both sides by $e^{\lambda}$ gives $\sum_x \delta(x)\lambda^x/x! = e^{-\lambda} = \sum_x (-1)^x \lambda^x/x!$, and the coefficients of the two power series must agree. Thus, if it is observed that $X = 200$, so that it is astronomically implausible that $e^{-2\lambda}$ is anywhere near 1, the desideratum of unbiasedness nonetheless requires us to use $(-1)^{200} = 1$ as our estimate of $e^{-2\lambda}$. And if $X = 3$ is observed, the situation is even more absurd: we must use $(-1)^3 = -1$ as an estimate of a quantity that we know to be in the interval $(0, 1)$. A far better estimator of $e^{-2\lambda}$ is the biased estimator $e^{-2X}$ (which is the answer given by the well-known method of maximum likelihood).
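Both claims about the Poisson example can be checked by summing over the probability mass function. This Python sketch verifies that $E((-1)^X) = e^{-2\lambda}$ and compares the mean squared error of $(-1)^X$, which works out to $1 - e^{-4\lambda}$ because $((-1)^X)^2 = 1$, with that of the maximum-likelihood estimator $e^{-2X}$. Truncating the series after 200 terms is an assumption that is harmless for the moderate $\lambda$ values used.

```python
import math

def poisson_expectation(g, lam, terms=200):
    """E[g(X)] for X ~ Poisson(lam), truncating the series after `terms` terms."""
    total = 0.0
    p = math.exp(-lam)              # P(X = 0)
    for x in range(terms):
        total += g(x) * p
        p *= lam / (x + 1)          # P(X = x + 1) from P(X = x)
    return total

def unbiased(x):
    return (-1) ** x                # the only unbiased estimator, delta(x) = (-1)^x

def mle(x):
    return math.exp(-2 * x)         # maximum-likelihood estimator e^{-2X}

for lam in (0.5, 1.0, 2.0, 5.0):
    target = math.exp(-2 * lam)     # the estimand e^{-2 lam}
    bias = poisson_expectation(unbiased, lam) - target
    mse_unbiased = poisson_expectation(lambda x: (unbiased(x) - target) ** 2, lam)
    mse_mle = poisson_expectation(lambda x: (mle(x) - target) ** 2, lam)
    print(f"lam = {lam}: bias of (-1)^X = {bias:.1e}, "
          f"MSE (-1)^X = {mse_unbiased:.4f}, MSE e^(-2X) = {mse_mle:.4f}")
```

For each $\lambda$ tried, the biased estimator has the markedly smaller mean squared error.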

Here is a different counterexample, which the visually inclined may find even more horrifying. A light source is at an unknown location $\mu$ somewhere in the disk $D = \{(x, y) : x^2 + y^2 \le 1\}$ in the Euclidean plane (see Figure 1). A dart thrown at the disk strikes some random location $U$ in the disk, casting a shadow at a point $X$ on the boundary. The random variable $U$ is uniformly distributed in the disk, i.e., the probability that it is within any particular region is proportional to the area of the region. The boundary is a translucent screen, so that an observer located outside of the disk can see the location $X$ of the shadow, but cannot see where either the light source or the opaque object is. Given only that information (the location $X$ of the shadow), the location $\mu$ of the light source must be guessed.
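The dart-and-shadow mechanism can be simulated directly. In this Python sketch (the function names are illustrative), $U$ is drawn uniformly from the disk by taking its radius as $\sqrt{V}$ for uniform $V$, and the shadow $X$ is the point where the ray from $\mu$ through $U$ leaves the unit circle, found from the positive root of a quadratic in the ray parameter.

```python
import math
import random

def sample_uniform_disk(rng):
    """Draw a point uniformly from the unit disk.

    Taking the radius as sqrt(V), V uniform on [0, 1), makes the point
    uniform in area rather than in radius."""
    r = math.sqrt(rng.random())
    a = 2 * math.pi * rng.random()
    return (r * math.cos(a), r * math.sin(a))

def shadow(mu, u):
    """Point where the ray from the light source mu through u meets the unit
    circle (mu and u are assumed to lie in the disk, with mu != u)."""
    dx, dy = u[0] - mu[0], u[1] - mu[1]
    # Solve |mu + t*(u - mu)|^2 = 1, i.e. a t^2 + b t + c = 0.
    a = dx * dx + dy * dy
    b = 2 * (mu[0] * dx + mu[1] * dy)
    c = mu[0] ** 2 + mu[1] ** 2 - 1  # c <= 0 because mu is in the disk
    t = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # the root with t >= 1
    return (mu[0] + t * dx, mu[1] + t * dy)

rng = random.Random(1)
mu = (0.3, -0.4)  # an arbitrary light-source position
for _ in range(3):
    u = sample_uniform_disk(rng)
    x = shadow(mu, u)
    print(f"U = ({u[0]:+.3f}, {u[1]:+.3f}) -> X = ({x[0]:+.3f}, {x[1]:+.3f}), "
          f"|X| = {math.hypot(*x):.6f}")
```

Because $U$ itself (the point at $t = 1$) lies in the disk, the exit point always has $t \ge 1$, so the shadow falls beyond the dart as seen from the light.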

A common-sense approach to guessing might proceed as follows. Before we observe the shadow, our information is invariant under rotations, and so should be our estimate; the only point of $D$ fixed by every rotation is the center, so we use $(0, 0)$ as our prior (i.e., pre-data) estimate. Then, when we observe $X$, since $X$ is more likely to be far from the light source than close to it, we adjust our estimate by moving it away from the shadow. Because the amount of information in the shadow is small, we don't move it very far. We get an estimator of the form $cX$ with $c < 0$, but $c$ is not very much less than 0.

If we insist on unbiasedness, we must choose $c$ so that $E(cX) = \mu$ uniformly in $\mu$. To think about that, we first express the problem in polar coordinates. Write $\mu = \rho(\cos\varphi, \sin\varphi)$ and $X = (\cos\Theta, \sin\Theta)$.

Proposition: The probability distribution of the random angle $\Theta$ is given by

$$P(d\theta) = \frac{1 - \rho\cos(\theta - \varphi)}{2\pi}\,d\theta. \tag{1}$$

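From this proposition it follows that $E(X) = -\mu/2$: integrating $(\cos\theta, \sin\theta)$ against (1), only the $\cos(\theta - \varphi)$ term contributes, and it gives $-\tfrac{\rho}{2}(\cos\varphi, \sin\varphi)$. Therefore, our unbiased estimator is $cX = -2X$, which is always absurdly remote from $D$: it lies at distance 2 from the center, a full radius outside the disk!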
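The step from (1) to $E(X) = -\mu/2$ can be checked numerically. This Python sketch applies the midpoint rule to (1) for an arbitrarily chosen $\mu$; because the integrands are low-degree trigonometric polynomials on a full period, the rule is exact up to rounding here.

```python
import math

def density(theta, rho, phi):
    """Proposition (1): density of the shadow angle Theta when mu = rho(cos phi, sin phi)."""
    return (1 - rho * math.cos(theta - phi)) / (2 * math.pi)

def mean_shadow(rho, phi, steps=2000):
    """Midpoint-rule values of the total mass of (1) and of E(X) = E(cos Theta, sin Theta)."""
    h = 2 * math.pi / steps
    total = ex = ey = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        w = density(theta, rho, phi) * h
        total += w
        ex += math.cos(theta) * w
        ey += math.sin(theta) * w
    return total, (ex, ey)

rho, phi = 0.6, 1.0  # arbitrary light-source position in polar coordinates
total, (ex, ey) = mean_shadow(rho, phi)
mu = (rho * math.cos(phi), rho * math.sin(phi))
print(total)                                # 1.0 up to rounding: (1) is a density
print((ex, ey), (-mu[0] / 2, -mu[1] / 2))   # the two pairs agree: E(X) = -mu/2
```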

Figure 2.

Proof of the Proposition. A simplification will follow from the observation that the way in which the probability distribution $P(d\theta)$ depends on $\mu$ is both rotation-equivariant and affine. That it is affine means that if the probability distribution of $\Theta$ is $P_\mu(d\theta)$ when the light source is at $\mu$, then $P_{\alpha\mu + (1-\alpha)\nu}(d\theta) = \alpha P_\mu(d\theta) + (1 - \alpha)P_\nu(d\theta)$ for any value of $\alpha$ for which $\alpha\mu + (1 - \alpha)\nu$ remains within the disk. (An affine mapping is one that preserves linear combinations in which the sum of the coefficients is 1; a linear combination satisfying that constraint is an "affine combination.") To see that this mapping is affine, consider Figure 2. The area between $\mu$ and the arc from $A$ to $B$ is the sum of the area of the triangle $\mu AB$ and the area of the region bounded by the arc $AB$ and the secant line $AB$. As $\mu$ moves, the area bounded by the arc and the secant line remains constant and the area of the triangle depends on $\mu$ in an affine fashion. Since $U$ is uniform, the probability that the shadow falls on the arc $AB$ is this area divided by $\pi$, so it too is affine in $\mu$. The desired "affinity" follows.
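The affinity argument rests on the area of triangle $\mu AB$ being an affine function of $\mu$ when $A$ and $B$ are fixed. This Python sketch checks that with the signed (shoelace) area; taking the area as signed also covers the case of $\mu$ lying on the arc's side of the chord, where the triangle's area is subtracted from the segment's rather than added. The points are arbitrary.

```python
import math

def signed_triangle_area(p, a, b):
    """Signed area of triangle (p, a, b) by the shoelace formula:
    positive when p, a, b run counterclockwise, negative when clockwise."""
    return 0.5 * ((a[0] - p[0]) * (b[1] - p[1]) - (b[0] - p[0]) * (a[1] - p[1]))

def mix_point(mu, nu, alpha):
    """The affine combination alpha*mu + (1 - alpha)*nu."""
    return (alpha * mu[0] + (1 - alpha) * nu[0], alpha * mu[1] + (1 - alpha) * nu[1])

# Arbitrary chord endpoints A, B on the unit circle and two light-source positions.
A = (math.cos(0.2), math.sin(0.2))
B = (math.cos(2.1), math.sin(2.1))
mu, nu = (0.1, -0.3), (-0.5, 0.2)

for alpha in (0.0, 0.35, 0.8):
    lhs = signed_triangle_area(mix_point(mu, nu, alpha), A, B)
    rhs = alpha * signed_triangle_area(mu, A, B) + (1 - alpha) * signed_triangle_area(nu, A, B)
    print(f"alpha = {alpha}: {lhs:.12f} vs {rhs:.12f}")
```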

Rotation-equivariance reduces the problem to finding the probability distribution when $\mu$ is between $(0, 0)$ and $(1, 0)$. "Affinity" reduces it from there to the problem of finding the probability distribution when $\mu$ is at either of those two points.

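If $\mu = (0, 0)$, the probability distribution of $\Theta$ is clearly uniform on the interval from $0$ to $2\pi$, i.e., it is $d\theta/(2\pi)$. If $\mu = (1, 0)$, then for $0 < \theta < 2\pi$ we have

$$P(0 < \Theta < \theta) = \frac{\text{area between arc and straight line from } (1, 0) \text{ to } (\cos\theta, \sin\theta)}{\text{area of disk}} = \frac{\theta - \sin\theta}{2\pi},$$

since the circular segment cut off by a chord subtending the angle $\theta$ has area $(\theta - \sin\theta)/2$ and the disk has area $\pi$. Differentiation yields

$$P(d\theta) = \frac{1 - \cos\theta}{2\pi}\,d\theta.$$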
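With $\mu = (1, 0)$, the shadow lands on the arc from $(1, 0)$ counterclockwise to $(\cos\theta, \sin\theta)$ exactly when $U$ lies in the circular segment cut off by that chord, so $P(0 < \Theta < \theta)$ is the segment's share of the disk. This Monte Carlo sketch in Python, using rejection sampling from the enclosing square, compares that share with $(\theta - \sin\theta)/(2\pi)$.

```python
import math
import random

def segment_fraction(theta, trials=100_000, seed=0):
    """Monte Carlo estimate of the fraction of the unit disk lying between the
    arc from A = (1, 0) counterclockwise to B = (cos theta, sin theta) and the chord AB."""
    rng = random.Random(seed)
    ax, ay = 1.0, 0.0
    bx, by = math.cos(theta), math.sin(theta)
    hits = inside = 0
    while inside < trials:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y > 1:
            continue  # rejection step: keep only points in the disk
        inside += 1
        # The segment lies to the right of the directed chord A -> B.
        if (bx - ax) * (y - ay) - (by - ay) * (x - ax) < 0:
            hits += 1
    return hits / trials

for theta in (0.5, math.pi / 2, math.pi, 4.0):
    exact = (theta - math.sin(theta)) / (2 * math.pi)
    print(f"theta = {theta:.3f}: simulated {segment_fraction(theta):.4f}, exact {exact:.4f}")
```

For $\theta = \pi$ the chord is a diameter and both numbers are close to $1/2$, as they should be.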
If $\mu = (\rho, 0)$, then by "affinity" we have

$$P(d\theta) = (1 - \rho)\,\frac{d\theta}{2\pi} + \rho\,\frac{(1 - \cos\theta)\,d\theta}{2\pi} = \frac{1 - \rho\cos\theta}{2\pi}\,d\theta.$$

Rotation-equivariance then gives (1). ■

The Bayesian approach to statistical inference assigns probabilities, not to events that are random, according to their relative frequencies of occurrence, but to propositions that are uncertain, according to the degree to which known evidence supports them. Accordingly, we can regard the location $\mu$ of the light source as uniformly distributed in the disk, and then use the conditional expected location $E(\mu \mid X)$ as an estimator of $\mu$. Equation (1) gives the conditional distribution of $\Theta$ given $\mu$; the marginal (i.e., "unconditional") distribution of $\mu = \rho(\cos\varphi, \sin\varphi)$ is given by

$$\frac{\rho\,d\rho\,d\varphi}{\pi}. \tag{2}$$



The joint distribution of $(\mu, \Theta)$ is the product of (1) and (2):

$$\frac{(1 - \rho\cos(\theta - \varphi))\,\rho\,d\rho\,d\varphi\,d\theta}{2\pi^2}. \tag{3}$$

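The conditional distribution of $\mu = \rho(\cos\varphi, \sin\varphi)$ given that $\Theta = \theta$ comes from regarding (3) as a function of $\rho$ and $\varphi$ with $\theta$ fixed and normalizing:

$$P(d\rho, d\varphi \mid \Theta = \theta) = \frac{(1 - \rho\cos(\theta - \varphi))\,\rho\,d\rho\,d\varphi}{\text{constant}}.$$

Integration shows that the "constant" is $\pi$ (the cosine term integrates to zero over $\varphi$, leaving $\int_0^{2\pi}\!\int_0^1 \rho\,d\rho\,d\varphi = \pi$). Finally, we get

$$E(\mu \mid X) = \int_0^{2\pi}\!\!\int_0^1 \rho(\cos\varphi, \sin\varphi)\,\frac{1 - \rho\cos(\theta - \varphi)}{\pi}\,\rho\,d\rho\,d\varphi = -(\cos\theta, \sin\theta)/4 = -X/4,$$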
which is an eminently reasonable estimator under the circumstances. 
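The two integrals in the Bayesian calculation (the normalizing constant $\pi$ and the posterior mean $-X/4$) can be confirmed with a two-dimensional midpoint rule. This Python sketch is one way to do it; the grid sizes are ad hoc choices, and the small remaining error comes from the midpoint rule in $\rho$.

```python
import math

def posterior_mean(theta, n_rho=400, n_phi=400):
    """Midpoint-rule evaluation of the normalizing constant and of E(mu | Theta = theta)
    under the uniform prior (2), using the unnormalized posterior
    (1 - rho cos(theta - phi)) rho d rho d phi."""
    h_rho, h_phi = 1.0 / n_rho, 2 * math.pi / n_phi
    norm = mx = my = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * h_rho
        for j in range(n_phi):
            phi = (j + 0.5) * h_phi
            w = (1 - rho * math.cos(theta - phi)) * rho * h_rho * h_phi
            norm += w
            mx += rho * math.cos(phi) * w
            my += rho * math.sin(phi) * w
    return norm, (mx / norm, my / norm)

theta = 0.7  # an arbitrary observed shadow angle
norm, (mx, my) = posterior_mean(theta)
print(norm, math.pi)                                          # agree to many digits
print((mx, my), (-math.cos(theta) / 4, -math.sin(theta) / 4))  # agree to about 1e-6
```

Since $|-X/4| = 1/4$, the posterior mean always lies well inside $D$, in contrast with the unbiased estimator $-2X$.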